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Abstract 

Although rice yield has been doubled in nnost parts of the world since 1960s, thanks to the advancements in breeding 
technologies, the biological mechanisms controlling yield are largely unknown. To understand the genetic basis of rice 
yield, a number of quantitative trait locus (QTL) mapping studies have been carried out, but whole-genome QTL mapping 
incorporating all interaction effects is still lacking. In this paper, we exploited whole-genome markers of an immortalized F2 
population derived from an elite rice hybrid to perform QTL mapping for rice yield characterized by yield per plant and 
three yield component traits. Our QTL model includes additive and dominance main effects of 1,619 markers and all pair- 
wise interactions, with a total of more than 5 million possible effects. The QTL mapping identified 54, 5, 28 and 4 significant 
effects involving 103, 9, 52 and 7 QTLs for the four traits, namely the number of panicles per plant, the number of grains per 
panicle, grain weight, and yield per plant. Most identified QTLs are involved in digenic interactions. An extensive literature 
survey of experimentally characterized genes related to crop yield shows that 1 9 of 54 effects, 4 of 5 effects, 1 2 of 28 effects 
and 2 of 4 effects for the four traits, respectively, involve at least one QTL that locates within 2 cM distance to at least one 
yield-related gene. This study not only reveals the major role of epistasis influencing rice yield, but also provides a set of 
candidate genetic loci for further experimental investigation. 
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Introduction 

Given the paramount importance in sustaining food demand- 
ing, great efforts have been made in large scale genetic research 
and extensive breeding programs in almost all rice {Oryza sativa L.) 
producing countries [1,2]. Gains in rice yield in recent decades are 
mainly owed to advancements in breeding technologies including 
selection of cultivars with higher productivity and significant 
increase of agricultural inputs such as fertilizers and insecticides 
[3]. While global environmental degradation has limited further 
yield increase through more agricultural inputs, studying the 
underlying biological processes of rice yield, and transferring the 
knowledge gains into improvement in breeding and agronomic 
productivity have become the key for further increase of food 
production [4]. 

Rice yield is determined by several factors including the number 
of panicles per plant, the number of grains per panicle and grain 
weight. These component traits and tiie overall yield per plant 
exhibit continuous variation since they are influenced by multiple 
genetic factors named quantitative trait loci (QTLs) and other 
environmental factors. Genetic markers such as restriction 
fragment length polymorphisms (RFLPs) [5] and simple sequence 
repeats (SSRs) [6] have been utilized to identify QTLs for 
understanding genetic basis controlling rice yield [7-1 1]. A recent 
study on QTL mapping for rice yield derived a high density single 
nucleotide polymorphism (SNP) bin map from genomic sequences 



obtained using deep sequencing technology, and demonstrated 
that such high density SNP bin map enabled to identify more 
QTLs with higher location precision than the traditional approach 
based on RFLP and SSR markers [12]. However, these studies 
attempted to identify QTLs individually via single interval 
mapping [5] or composite interval mapping with a small scan 
window [13], which had limited power of detection, given that 
many agronomic traits are controlled simultaneously by multiple 
QTLs and influenced by environmental factors [14,15]. 

Whole-genome marker QTL mapping employs a multiple QTL 
model that includes all available markers and evaluates effects of 
these markers simultaneously [16-18]. Such approach overcomes 
the limitations of the traditional single marker-based QTL 
mapping methods [16]. However, when genetic interactions are 
considered, a multiple QTL model can have a huge number of 
variables, which makes model inference very challenging. Early 
methods for multiple QTL mapping usually rely on Markov chain 
Monte Carlo (MCMC) simulation to fit a Bayesian model [16-20], 
which is computationally intensive and unpractical when a large 
number of markers are considered. Recentiy, more efficient and 
accurate methods have been developed [21,22], which make 
whole-genome marker QTL mapping feasible. With whole- 
genome marker QTL mapping considering main effects and 
interactions of all additive and dominance effects simultaneously, 
contributions of numerous genetic effects to rice yield can be 
assessed. 
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Table 1. Estimated QTL effects from the full model for the number of panicles per plant. 





Lociliy)^ 




p-value' 


r 


LocUi j) 


%) 


p-value 


H 


(757 add, 757 add) 


-0.12(0.04) 


2.16x10"^ 


0.0030 


(104 dom, 732 add) 


-0.11(0.05) 


9.54x10"^ 


0.0012 


7 add' 220 dom) 


0.20(0.06) 


2.39x10"" 


0.0032 


(186 dom, 735 add) 


-0.20(0.06) 


1.72x10"" 


0.0039 


(lO add. 887_dom) 


-0.25(0.05) 


3.78x10"' 


0.0061 


(518 dom, 759 add) 


-0.27(0.06) 


1.64x10""* 


0.0075 


(18_add. 1407 dom) 


0.28(0.06) 


1.45x10"*^ 


0.0080 


(220 dom, 784 add) 


0.23(0.06) 


2.38x10"^ 


0.0045 


(20 add. 1026 dom) 


0.27(0.06) 


1.82x10"** 


0.0060 


(561 donr 828 add) 


-0.36(0.05) 


4.88x10"'^ 


0.0140 


(44_add. 532_dom) 


-0.27(0.05) 


1.16x10"' 


0.0071 


(861 add, 918 add) 


0.41 (0.05) 


1.11 xlO"'^ 


0.0182 


(69_add, 913_dom) 


0.21(0.05) 


2.39x10"^ 


0.0040 


(904 add, 1113_dom) 


0.33(0.05) 


3.50x10"'° 


0.0098 


(123 add, 1132 add) 


0.23(0.05) 


7.78x10"' 


0.0053 


(213 dom, 929 add) 


0.27(0.05) 


4.85x10"" 


0.0076 


(166_add, (684 add) 


0.46(0.04) 


<10"'^ 


0.0279 


(967_add, 1515 add) 


0.20(0.05) 


2.91 x10"= 


0.0040 


(186 add, 1372 add) 


-0.11(0.04) 


2.86x10"^ 


0.0013 


(908 dom, 994 add) 


0.90(0.05) 


<10"'^ 


0.0782 


(192 add, 580 add) 


-0.11(0.04) 


8.35x10"^ 


0.0013 


(1026 add, 1173 add) 


0.21 (0.05) 


9.05x10""* 


0.0044 


(199_add, 782 dom) 


-0.08(0.03) 


2.53x10"^ 


0.0006 


(1037_add, 1510 add) 


0.14(0.05) 


1.13x10"^ 


0.0018 


(208_add, 309 add) 


-0.31(0.05) 


1.50x10"' 


0.0092 


(1089 dom, 1096 add) 


0.34(0.12) 


2.17x10"^ 


0.0016 


(227 add, 364 dom) 


-0.36(0.05) 


3.18x10"'^ 


0.0145 


(1119_add, 1471 add) 


-0.19(0.04) 


1.29x10"^ 


0.0045 


(244 add, 1303 dom) 


-0.11(0.04) 


5.10x10"^ 


0.0012 


(229 dom, 1160 add) 


-0.11(0.04) 


5.70x10"^ 


0.0012 


(249 add, 41 7 dom) 


0.12(0.04) 


3.14x10"^ 


0.0015 


(1208 add, 1583 dom) 


-0.14(0.05) 


3.20x10"^ 


0.0015 


(333 add, 991 add) 


0.24(0.05) 


3.92x10"' 


0.0057 


(64 dom, 1223 add) 


-0.43(0.05) 


2.22x10"'' 


0.0191 


(335 add, 372 add) 


0.20(0.05) 


2.05x10"^ 


0.0041 


(1237 add, 1370 add) 


0.54(0.05) 


<10"'' 


0.0279 


(349_add, 1425_dom) 


-0.23(0.05) 


3.20x10""* 


0.0060 


(1334 add, 1576 add) 


0.22(0.05) 


2.94x10""* 


0.0049 


(354_add, 358 dom) 


-0.50(0.05) 


<10"i = 


0.0233 


(408 dom, 1356 add) 


-0.73(0.05) 


<10"' = 


0.0488 


(371 dom, 381 add) 


0.37(0.05) 


6.01 xlO"" 


0.0113 


(1065 dom, 1394 add) 


-0.17(0.05) 


1.55x10"" 


0.0026 


(421 add, 1079 add) 


-0.15(0.04) 


1.04x10"'' 


0.0023 


(981 dom, 1558 add) 


-0.79(0.05) 


<10"'' 


0.0735 


(456_,dd, 1282 add) 


0.38(0.05) 


9.99x10"'^ 


0.0167 


(1094_dom, 1558_add) 


0.21 (0.05) 


4.05x10"' 


0.0046 


(517_add, 1346 add) 


-0.10(0.04) 


8.46x10"^ 


0.0009 


(1217 dom,1615 add) 


0.37(0.05) 


5.88x10"'^ 


0.0144 


(520 add,595 dom) 


-0.37(0.05) 


4.03x10"" 


0.0109 


(54_dom, 1117_dom) 


0.27(0.05) 


3.96x10"' 


0.0052 


(15_dom, 534_add) 


0.47(0.05) 


<10"'' 


0.0246 


(627 dom, 681 dom) 


-0.25(0.05) 


1.05x10""* 


0.0048 


(649 add, 1364 add) 


-0.20(0.04) 


3.80x10"** 


0.0045 


(786 dom, 810 dom) 


0.15(0.05) 


1.11 xlO"^ 


0.0021 


Parameter(s) 








a = 0.5, fc = 0.5 








/J 








0.0035 
















0.1444 








If 








0.9405 









"^add: additive effect; dom: dominance effect. If / equals j, then it is a main effect, otherwise, it is an interaction between locus / and locus j. Total number of effects is 112, 
only 54 effects with a p-value <0.01 are listed in this table. 

'^he estimated marker effect is denoted by ^ and the standard deviation is denoted by s-^. 
*^P-value is obtained via f-test. 
'^Phenotypic variation explained. 
doi:1 0.1 371 /journal.pone.0087330.t001 



In this study, we applied our empirical Bayesian least absolute 
shrinkage and selection operator (EBlasso) method [21,22] to 
whole-genome QTL mapping for an elite indica rice hybrid, 
Shanyou 63 [7,23]. Our EBlasso model includes additive and 
dominance main effects of 1,619 markers, aU additive x additive 
interactions, additive x dominance interactions, dominance x 
additive interactions, and dominance x dominance interactions, 
with a total of more than 5 million possible effects. The 
quantitative traits considered in this study include yield per plant 
and three yield component traits, namely the number of panicles 
per plant, the number of grains per panicle and grain weight. We 
will demonstrate that our EBlasso identifies a number of QTLs, 
most of which are involved in digenic interactions, and coincide 
with or are close to experimentally investigated genes related to 
yield. 



Results 

Four quantitative traits including three rice yield component 
traits (the number of panicles per plant, the number of grains per 
panicle and grain weight) and overall yield per plant were 
analyzed using the EBlasso method. The full QTL model includes 
main additive and dominance effects of 1 ,6 1 9 markers and all their 
pair-wise interactions, with a total of A: = 5,242,322 variables (see 
the Materials and Methods section for the genetic map). To 
understand the performance gain of the full model, we also 
performed QTL mapping for the four traits with a QTL model 
including k = 3,238 main effects, which is referred to as the main 
effect model. 

We estimated the phenotypic variance explained by a particular 
QTL J as hj = - 2 > i~ 1; 2, k' , where var(«,) is the 
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Figure 1. Interaction network of 103 QTLs for the number of panicles per plant. The circle shows the bin map and columns indicate 
position of the makers (ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line 
indicates a main effect. IVlolecularly characterized genes related to yield are also labeled in the appropriate positions of the genome. 
doi:1 0.1 371 /journal.pone.0087330.g001 



variance of the coefficient of QTL j and the total phenotypic 
variance ff^ was estimated from the data. To estimate the total 
variance explained by aU identified QTLs, we refitted the data to 
an ordinary linear regression model that includes variables 
corresponding to the identified QTLs. The phenotypic values 
were predicted from the linear regression model as j', and the total 
phenotypic variance explained by all identified QTLs was 
calculated as 

, var(j)) varCP) 



QTL mapping for the number of panicles per plant 

The three-step cross validation (GV) procedure (detailed in the 
Materials and Methods section) for the fuU model identified the 
optimal pair of parameters as (a, A) = (0.5, 0.5) (Table SI in File 
SI). Using the optimal values of (a, h), the EBlasso algorithm 
shrunk most of k variables to zero and yielded a QTL model with 
1 1 1 nonzero effects. The statistical test, described in the Materials 
and Methods section, for each nonzero effect identified 54 
significant effects at a ^-value ^0.01 (Table 1). Among them, 
one was main additive effect, 18 were additive x additive 
interaction, 32 were additive xdominance interaction, and three 
were dominance x dominance interaction. The 54 effects involved 
103 QTLs and explained 94.05% of the total phenotypic variance. 



Table 2. Estimated QTL effects from the main effect model 
for the number of panicles per plant. 



locus^ 


ksf)" 


p-va\ue' 




3 add 


0.22(0.09) 


6.86x10"^ 


0.0088 


228 add 


-0.24(0.09) 


2.98x10"^ 


0.0125 


353_,dd 


-0.24(0.09) 


2.68x10"^ 


0.0126 


757 add 


-0.54(0.10) 


4.18x10"" 


0.0625 


818 add 


0.40(0.10) 


4.82x10"^ 


0.0300 


908_add 


0.31(0.09) 


4.78x10"" 


0.0206 


994_add 


0.54(0.11) 


3.53x10"' 


0.0524 


1363 add 


-0.26(0.09) 


2.29x10"^ 


0.0135 


461 dom 


0.30(0.10) 


1.92x10"^ 


0.0091 


861 dom 


-0.45(0.10) 


1.17x10"= 


0.0209 


Parameter(s) 




a=-0.01, b = 0.5 








-0.0600 




"l 




1.5260 




1? 




0.3976 





"add: additive effect; dom: dominance effect. Total number of effects is 10, all 
with a p-value <0.01. 

h"he estimated marl<er effect is denoted by ^ and the standard deviation is 
denoted by .s-^. 

*^P-value is obtained via f-test. 
'^Phenotypic variation explained. 
doi:1 0.1 371/journal.pone.0087330.t002 
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Table 3. Estimated QTL effects from the full model for the 
number of grains per panicle. 



Loci(i jV 




p-value' 


J 


(436_add, 436_add) 


6.79(0.98) 


1.58x10"" 


0.0846 


(10 d„^, 50 add) 


-8.74(1.41) 


1.05x10"' 


0.0695 


(875_add, 1156 dom) 


7.15(1.44) 


6.37x10"' 


0.0427 


(595 do,^, 1004 add) 


-12.78(1.88) 


3.34x10"" 


0.0853 


(381_dom, 1057_add) 


-8.82(1.33) 


9.40x10"" 


0.0801 


Parameter(s) 




a = 0.05, b = 0.1 








-0.3228 








156.7998 




it' 




0.4651 





"add: additive effect; dom: dominance effect. If / equals j, then it is a main effect, 
otherwise, it is an interaction between locus / and locus j. Total number of 
effects Is 5, all with a p-value <0.01. 

'^he estimated marker effect is denoted by ^ and the standard deviation is 
denoted by Sj^. 

*^P-value Is obtained via f-test. 
'^Phenotypic variation explained. 
doi:l 0.1 371 /journal.pone.0087330.t003 



We did a literature survey and found 99 genes with known 
genomic locations that had experimental evidence showing that 
they were related to rice yield and yield component traits. For each 
of the 103 QTLs, we identified genes from 99 experimentally 
investigated genes that were within 20 centi-Morgan (cM) distance 
and associated such genes with the QTL. In total, we found 58 
genes for 103 QTLs. For the ease of presentation, we organized 
QTLs within 20 cM distance into a group, which resulted in 51 
groups for 103 QTLs. These 51 QTL groups and associated genes 
are listed in Table S2 in File SI . It is seen that 36 groups of QTLs 
have at least one associated gene and the distances between QTLs 
and their associated genes are relatively small (median distance 
5.37 cM). Moreover, 21 QTLs involved in 19 of 54 effects locate 
within 2 cM distance to at least one gene influencing rice yield. 
The interaction network of the 103 QTLs and their associated 
genes are visualized in Figure 1. 

The three-step CV for the main effect model identified the 
optimal pair of parameters as (a, b) = ( —0.01, 0.5) (Table SI in File 
S 1), with which eight additive and two dominance effects involving 
ten QTLs were identified with a /(-value SO. 01 (Table 2). The ten 
effects totally explained 39.76% of the phenotypic variance, and 
nine of them had genes related to yield within 20 cM distance 
(median distance 9.29 cM) (Table S3 in File SI). Seven QTLs 
were identical to QTLs or within the QTL group identified from 
the full model (Bins 228, 353, 757, 861, 908, 994, 1363), and the 
other three (Bins 3, 461 and 818) were close to QTLs identified by 




Figure 2. Interaction networit of nine QTLs for tKie number of grains per panicle. The circle shows the bin map and columns indicate 
position of the makers (ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line 
indicates a main effect. Molecularly characterized genes related to yield are also labeled in the appropriate positions of the genome. 
doi:1 0.1 371/journal.pone.0087330.g002 
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the full model. Specifically, Bin3 was 3.97 cM away from Bin? 
identified from the full model; Bin461 was 8.29 cM away from 
Bin456 identified from the fuU model; and Bin818 was 6.15 cM 
away from Biii810 identified from the full model. Comparing the 
results obtained from the two models, we see that the full model 
idemified more QTLs, which included all those identified by the 
main effect model, and explained a much larger percentage of the 
phenotypic variance. 

QTL mapping for the number of grains per panicle 

The CV analysis identified the optimal pair of parameters (a, 
b) = (0.05, 0. 1) for the full QTL model for the number of grains per 
panicle (Table S4 in File SI), with which EBlasso identified five 
nonzero effects. AH of these nonzero effects were significant at a p- 
value SO. 01 (Table 3), including one main additive effect and four 
additive x dominance interactions. The five effects involved nine 
QTLs, and explained 46.5 1 % of the overall phenotypic variance. 
Eight of the nine QTLs have experimentally verified genes related 
to rice yield within 20 cM distance (median distance 4.86 cM) 
(Table S5 in File SI). Moreover, four of these QTLs involved in 
four effects locate within 2 cM distance to at least one yield-related 
gene. The interaction network of the nine QTLs and their 
associated genes are depicted in Figure 2. 

The same three-step CV for the main effect model identified the 
optimal parr of parameters {a, b) = { —0.4, 0.5) (Table S4 in File 
SI), with which five additive effects were identified, all having a. p- 
value <0.01 (Table 4). The five QTLs (Bins 43, 436, 877, 1006, 
1057) totally explained 41.48% of the phenotypic variance, and all 
had molecularly characterized genes related to rice yield within 

19 cM distance (median distance 1.59 cM) (Table S6 in File SI). 
AH five QTLs were identical or very close to the QTLs identified 
from the full model. Specifically, Bin436 and Bin 105 7 were 
identified in both models; Bin43 is 3.40 cM away from Bin50 
identified from the full model; Bin877 is 0.47 cM away from 
Bin875 identified from the full model; and BinlOOO is 0.72 cM 
away from Bin 1004 identified from the full model. Comparing the 
results obtained from the two models, we observed that although 
both models identified five effects, the full model identified four 
more QTLs and explained a slighdy larger percentage of 
phenotypic variance. Moreover, the main effect model identified 
five additive effects, but the full model identified QTLs with both 
additive and dominance effects. 

QTL mapping for grain weight 

The CV analysis determined the optimal (a, b) = {I, 1) (Table S7 
in File SI) for the full QTL model for grain weights. Using the 
optimal a and b, EBlasso yields a QTL model including 89 
nonzero effects, among which 28 effects were identified as 
significant at a /(-value ^0.01 (Table 5). Among them, one was 
a main additive effect, 10 were additive x additive, 15 were 
additive x dominance, and two were dominance x dominance 
interactions. The 28 effects involved 52 QTLs, and explained 
93.79% of the phenotypic variance. QTLs with a distance ^ 

20 cM were placed into a group, resulting in 32 groups, and 26 of 
the 32 QTL groups had at least one gene within 20 cM distance 
(median distance 5.06 cM) (Table S8 in File SI). Moreover, 15 
QTLs involved in 1 2 of 28 effects locate within 2 cM distance to at 
least one yield-related gene. The interaction network of the 52 
QTLs and their associated genes are shown in Figure 3. 

The CV analysis for the main effect model identified the 
optimal pair of parameters (a, b) = {l, 1) (Table S7 in File SI), with 
which 26 QTLs (19 additive and 7 dominance effects) were 
identified with a /(-value <0.01 (Table 6). The 26 QTLs totaUy 
explained 84.24% of the overall phenotypic variance, and 23 of 



Table 4. Estimated QTL effects from the main effect model 
for the number of grains per panicle. 





locus^ 




p-value' 




43_add 


-5.15(1.08) 


1.38x10""^ 


0.0454 


436 add 


8.06(1.03) 


6.02x10"" 


0.1190 


877 add 


3.64(1.05) 


3.21 xlO"" 


0.0225 


1006 add 


-8.00(1.15) 


1.48x10"" 


0.1036 


1057 add 


-7.51(1.11) 


3.55x10"" 


0.0973 


Parameter(s) 




a=-0.4, b = 0.5 




A' 




-0.6700 








171.21 








0.4148 




"add: additive effect; dom: dominance effect. Total number of effects is five, all 



with a p-value <0.01 . 

'^he estimated marker effect is denoted by ^ and the standard deviation is 
denoted by .v^. 

*^P-value Is obtained via f-test. 
''phenotypic variation explained. 
doi:1 0.1 371 /journal.pone.0087330.t004 

them had molecularly characterized genes related to rice yield 
within 16 cM distance (median distance 4.08 cM) (Table S9 in 
File SI). Twenty three of the 26 QTLs were identical to or within 
a QTL group identified from the full model, but three QTLs (Bins 
228, 843, and 894) do not correspond to any QTLs identified from 
the full model within 20 cM distance. Again, the full model 
identified more QTLs than the main effect model and the QTLs 
detected by the full model explained more phenotypic variance 
than those detected by the main effect model. 

QTL mapping for yield per plant 

The C V analysis determined the optimal parr of parameters (a, 
h) = {l, 1) for the fuU QTL model for rice yield (Table SIO in File 
SI). Using the optimal values of [a, b), EBlasso yielded four 
nonzero effects, all were significant at a /(-value ^0.01: one main 
additive effect, one additive x additive interaction, one additive 
X dominance interaction, and one dominance x dominance 
interaction (see Table 7). The four effects involved seven QTLs 
and explained 34.01% of the overall phenotypic variance. Five out 
of the seven QTLs have an experimentally verified gene within 
15 cM distance (median distance 2.21 cM) (Table SI 1 in File SI). 
Moreover, two QTLs involved in two of four effects locate within 
2 cM distance to at least one yield-related gene. The interaction 
network of the seven QTLs and their associated genes are 
described in Figure 4. 

The optimal pair of parameters determined by the CV analysis 
for the main effect model was (a, b) = (—0.5, 0. 1) (Table S 10 in File 
SI), with which four QTLs with a /(-value ^0.01 were identified 
(Table 8). The four QTL effects explained 23.79% of the 
phenotypic variance, and all had at least one gene within 17 cM 
distance (median distance 7.82 cM) (Table S12 in File SI). Two of 
the four QTLs (Binl014, Bin 105 7) were identical to the QTLs 
identified from the fuU model, but the other two QTLs do not 
correspond to any QTL identified from the fuU model within 20 
cM distance. Overall, although the full model did not detect all 
QTLs identified by the main effect model, it still detected more 
QTLs and explained more phenotypic variance. 
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Table 5. Estimated QTL effects from the full model for grain 
weight. 





Loci(iy)^ 




p-value' 




r 


(729_add, 729 add) 


1.02(0.07) 


<10"' = 




0.1548 


(37 add. 547 do„,) 


0.71 (0.09) 


1.29x10" 


-14 


0.0428 


(67_add. 772 add) 


-0.25(0.08) 


6.56x10" 


-4 


0.0047 


(96_add, 1117_dom) 


0.21(0.08) 


2.95x10" 


-3 


0.0035 


(119_add, 987_add) 


0.18(0.07) 


6.73x10" 


-3 


0.0024 


(151 add, 1262 add) 


-0.15(0.07) 


9.79x10" 


-3 


0.0018 


(71_dom. 184 add) 


-0.67(0.09) 


1.55x10" 


-13 


0.0374 


(210 add. 1400 add) 


0.19(0.08) 


7.79x10" 


-3 


0.0025 


(329 add. 727 dom) 


0.22(0.09) 


4.63x10" 


-3 


0.0040 


(310_d„m, 419_add) 


-0.21(0.05) 


1.86x10" 


-5 


0.0043 


(431_add. llll.add) 


0.35(0.08) 


1.01 xlO" 


-5 


0.0107 


(71 dom, 500 add) 


-0.76(0.08) 


<10"'^ 




0.0493 


(583 add. 1578 dom) 


0.35(0.08) 


9.50x10" 


-6 


0.0092 


(107_dom, 700 add) 


0.19(0.07) 


4.80x10" 


-3 


0.0035 


(708_dom, 714 add) 


-1.15(0.32) 


1.79x10" 


-4 


0.0076 


(818_add. 1100 add) 


0.26(0.08) 


3.93x10" 


-4 


0.0053 


(916_add, 1026 add) 


0.15(0.06) 


8.50x10" 


-3 


0.0019 


(472_dom, 920 add) 


-0.27(0.09) 


1.65x10" 


-3 


0.0058 


(18_dom. 955 add) 


-0.20(0.08) 


3.77x10" 


-3 


0.0033 


(971 add. 1461 add) 


0.27(0.08) 


4.75x10" 


-4 


0.0075 


(620 dom, ion add) 


-0.67(0.09) 


3.31 xlO" 


-12 


0.0336 


(1035 add. 1224 add) 


0.30(0.07) 


3.27x10" 


-5 


0.0081 


(1093 add, 1407_dom) 


0.44(0.08) 


2.13x10" 


-7 


0.0148 


(1 1 67_dom, 1 1 68_add) 


-0.47(0.16) 


1.37x10" 


-3 


0.0051 


(119_dom, 1375_add) 


0.61 (0.09) 


7.83 X 1 0 


-11 


0.0289 


(1397 add, 1505 add) 


0.41 (0.09) 


1.95x10" 


-6 


0.0119 


(247_dom, 1505 dom) 


-0.23(0.08) 


1.22x10" 


-3 


0.0032 


(647_dom, 796 dom) 


0.26(0.08) 


4.29x10" 


-4 


0.0044 


Parameter(s) 




a = l, b = 


1 








-0.0661 










0.5317 










0.9379 







"add: additive effect; dom: dominance effect. If / equals j, then it is a main effect, 
otherwise, it is an interaction between locus / and locus j. Total number of 
effects Is 90, only 28 effects with a p-value <0.01 are listed in this table, 
'^he estimated marker effect is denoted by ^ and the standard deviation is 
denoted by s^. 

*^P-value Is obtained via f-test. 
'^Phenotypic variation explained. 
doi:1 0.1 371 /journal.pone.0087330.t005 

Effect types and pleiotropic genes 

Among the five types of effects (main additive, main dominance 
effects, additive x additive, additive xdominance, and dominance 
X dominance interactions) considered in the EBlasso full models 
for four traits, no main dominance effects was detected, but several 
dominance x dominance interactions (one for rice yield, three for 
the number of panicles per plant, and two for grain weight) were 
identified. Many additive xdominance interaction effects were 
identified, including one for rice yield, 32 for the number of 
panicles per plant, four for the number of grains per panicle, and 
15 for grain weight. Phenotypic variance explained by a single 



effect is relatively small for all traits (Tables 1, 3, 5 and 7). For 
example, the largest effect has r = 7.82% (908_d„n,ijiance 
x994 ajaitivc) for the number of panicles per plant, 8.53% 
(595_<jommanceX 1004 additive) for the number of grains per panicle, 
15.48% (729_„ddidve) for grain weight, and 6.08% (1057_,jditiveX 
1 144_domiiian< e) for ylcld pcr plant. Each main effect detected by the 
main effect model also explained a small percentage of the total 
phenotypic variance. 

Many molecularly characterized genes related to yield are 
known to play pleiotropic roles in regulating grain productivity 
[31]. Without surprise, a number of such genes coincide with or 
close to the QTLs that were identified by our EBlasso for multiple 
traits, although they did not necessarily have pleiotropic effects. 
For example, gene Ghd7, OsNRAAIPS and DEP2 are close to 
several QTLs common for the four phenotypes, qSW5/GW5, 
OsEF3 and LOG are near the QTLs for three phenotypes except 
the number of grains per panicle, and Gnla, OsJAG, GS3, OsJMTl, 
OsSPLM, GW8/OsSPL16, SGLl are associated with QTLs for 
three phenotypes except yield per plant. Besides Ghd7, OsNRAMPS 
and DEP2, gene FZP, OsSDR, and 0.sFAD8 was near QTLs for 
both yield per plant and the number of grains per panicle; 14 
genes were close to QTLs for both the number of panicles per 
plant and the number of grains per panicle; and 62 other genes 
were associated with QTLs for both the number of grains per 
panicle and grain weight. While the pleiotropic effect of some 
genes have been reported [32], our QTL mapping results 
identified a number of genes associated with multiple phenotypes, 
implying their possible pleiotropic role worthy of further 
experimental investigation. Moreover, it is also possible that the 
QTLs we detected may be closely linked to unknown genes, 
which, if identified, will yield more insight into the molecular basis 
of phenotypes [1]. 

Discussion 

Due to its small genome and close relatedness with other grass 
crops, rice has served as a model plant for investigating genetic 
factors underlying crop productivity [33,34]. To date, more than 
600 rice genes have been experimentally cloned with related traits 
including yield, biotic and abiotic stresses, grain quality, plant 
architecture, fertility, etc. [2] . However, there is still a knowledge 
gap regarding the molecular basis of yield-related biological 
processes [1], suggesting the importance of systematic tools that 
can enable to understand functional role of genes [2,35] . In this 
study, we employed a multiple QTL model that included all 
additive and dominance main effects of 1,619 markers, and all 
their pair-wise interactions with a total of more than 5 million 
possible effects, and then applied our EBlasso algorithm to identify 
QTLs for four agronomic related traits of rice, including yield, the 
number of panicles per plant, the number of grains per panicle 
and grain weight. Our QTL mapping revealed a number of QTLs 
for four traits, most of which are involved in digenic interactions. 
Moreover, most of these QTLs have at least one experimentally 
cloned gene within 20 cM distance. 

The same set of markers in the recombinant inbred line (RIL) 
population where the "immortalized Eg" (IMF2) was derived from 
were used for QTL mapping, via a composite interval mapping 
method with a scan window size of five markers [12]. Upon 
development of the IME2 population, this dataset was obtained 
and the ANOVA method was applied to each pair of markers to 
identify both main and digenic interaction effects from 5,242,322 
possible effects [24]. The composite interval mapping identified 
zero, three (Bin40, Bin446 and Bin 1006), seven (Bins 49, 1 7 1, 439, 
729, 928, 1008, and 1266), and one (Binl007) QTLs for the 
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Figure 3. Interaction network of 52 QTLs for grain weigfit. The circle shows the bin map and columns indicate position of the makers (ticks in 
million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line indicates a main effect. 
Molecularly characterized genes related to yield are also labeled in the appropriate positions of the genome. 
dol:l 0.1 371/journal.pone.0087330.g003 



number of panicles per plant, the number of grains per panicle, 
grain weight and yield per plant, respectively. The ANOVA 
method detected thousands (1432, 2696, 3524 and 2251) of 
digenic interactions between two bins with a^-value SO. 001; and 
after those digenic interactions involving adjacent bins were 
merged, 115, 189, 238, and 204 effects were reported, respectively 
[24]. In contrast, our EBlasso method identified a reasonable 
number of effects and QTLs for each trait, and 35%-80% of 
identified effects for four traits involve at least one QTL that 
locates within 2 cM distance to at least one gene related to crop 
yield, which corroborates the reliability of the identified effects. 

The list of genes associated with the identified QTLs provides 
insight into rice yield with respect to yield component traits. First, 
the number of panick^s dc'pcnds on plant's abilit)' of producing 
tillers, which is under genetic, developmental and environmental 
influence. While previous composite interval mapping did not 
identify any significant effect with the same set of markers in an 
RIL poptjlation [12], we have identified a set of QTLs that have 
nearby genes known to regulate plant tillering. For example, 
among genes in Table S2 in File S 1 , A40C1 / SPA is the first gene 
characterized for rice tillering; it initiates axillary buds that grow 
into lateral braches [36]. OsTBl/FCl has been identified as an 
important gene that negatively regulates lateral branching in rice 
[37]. OsSPL14 is a highly expressed gene in the shoot apex and 
primordial of primary and secondary branches, which promotes 
panicle branching while reducing tiller number [38]. Through 



gene mutations, D3, DIO, D14, D17/HTD1, and 2)27 were found 
to affect tiller initiation and/or outgrowth [37]. Secondly, the 
number of grains per panicle is another important trait 
determining crop yield. While composite interval mapping 
identified three QTLs (Bin40, Bin446 and Bin 1006) close to genes 
Gnla, GS3, OsNRAMPS and Ghd7, our EBlasso also identified these 
genes in addition to other 13 genes. Among them, F^Ph known to 
control spikelet meristem identity [39], Ghd7 is a pleiotropic gene 
affecting grain number, plant height and heading date [40], GW8/ 
OsSPLlG, PGL, and DEP2 all are known to be essential in 
regulating cell proliferation or elongation [41-43]. Thirdly, 
composite inter\'al mapping detected seven QTLs (Bins 49, 171, 
439, 729, 928, 1008, and 1266) for grain weight, with nearby 
genes Gnla, LAXl, GS3, GS5, qSW5/GW5, OsJMTl, OsIAA23, 
Ghd7, OsNRAMPS, TACl, LGDl and SGI. In addition to these 
genes, our EBlasso identified many other genes with known effects 
in controlling grain weight. For example, GIFl is a gene encoding 
a cell-waU invertase required for carbon partitioning during early 
grain filling, and overexpression of GIFl leads to larger and 
heavier grain weight [44] . Genes SRS3 and SRS5 have been found 
to regulate seed cell elongation [45,46] . Over-expression of LRKl 
gene restilts in enhanced cellular proliferation and increased grain 
weight [47] . Finally, yield per plant is the most complex trait and a 
small number of effects were identified compared with its 
component traits. While composite interval mapping identified 
only one QTL (Binl007) with nearby gene Ghd7 and OsNRAMPS, 
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Table 6. Estimated QTL effects from the main effect model 
for grain weight. 





locus" 


%)* 


p-value' 




37 add 


0.40(0.08) 


8.22x10"^ 


0.0262 


50 add 


0.21(0.08) 


3.14x10"' 


0.0072 


151_add 


-0.18(0.07} 


2.77x10"' 


0.0052 


173 add 


-0.49(0.07) 


6.46x10"" 


0.0418 


199_add 


0.23(0.06) 


1.51 xlO"" 


0.0077 


332_add 


0.30(0.06) 


1.26x10""^ 


0.0150 


440 add 


-0.98(0.06) 


<io-'= 


0.1670 


498_add 


-0.35(0.06) 


5.18x10"" 


0.0188 


710_add 


0.1 7(0.06) 


2.13x10"' 


0.0047 


729 add 


0.81(0.07) 


<10"" 


0.0968 


894 add 


-0.18(0.06) 


8.21 xlO"" 


0.0053 


936 add 


-0.44(0.07) 


3.11 x10"'° 


0.0291 


1008 add 


-0.34(0.06) 


6.63x10"" 


0.0177 


1110_add 


0.18(0.05) 


4.85x10"" 


0.0055 


1176_add 


-0.26(0.06) 


8.77x10""^ 


0.0108 


1251 add 


0.37(0.07) 


4.64x10"" 


0.0171 


1374_add 


0.33(0.06) 


2.55x10"^ 


0.0167 


1442_add 


-0.22(0.06) 


6.37x10"^ 


0.0079 


1565_add 


0.20(0.06) 


2.33x10"" 


0.0063 


38 dom 


0.29(0.07) 


3.18x10"^ 


0.0066 


228_dom 


-0.20(0.07) 


1.60x10"' 


0.0032 


312_dom 


0.17(0.06) 


3.20x10"' 


0.0024 


441_dom 


— 0.23(0.U/) 


7.95x10 " 


0.0043 


547_dom 


0.15(0.06) 


6.68x10"' 


0.0019 


843_dom 


0.16(0.06) 


5.41 xlO"' 


0.0021 


1506_dom 


-0.26(0.07) 


1.89x10"" 


0.0054 


Parameter{s) 




a=l, b=l 




A' 




0.1000 








0.5224 




P 




0.8424 





'^add: additive effect; dom: dominance effect. Total number of effects Is 38, only 
30 effects with a p-value <0.01 are listed In this table, 
'^he estimated marker effect is denoted by ji and the standard deviation is 
denoted by .v^. 

*^P-value Is obtained via f-test. 
*^Phenotypic variation explained. 
doi:1 0.1 371 /journal.pone.0087330.t006 

our EBlasso identified tliis QTL and six otlier QTLs, four of which 
have cloned gene within 15 cM distance (Table SII in File SI). 

In conclusion, taking advantage of the powerful EBlasso model 
for simultaneously accounting for more than 5 million possible 
effects, we identified a number of QTLs for four traits of the elite 
rice hybrid Shanyou 63, a vast majority of which are involved in 
digenic interactions. This set of QTLs not only shed light on the 
genetic basis of the yield of the rice hybrid, but also provide 
candidate loci for identification of new genes that may be involved 
in crop yield. 



Materials and Methods 

Plant materials and QTLs 

The genotype and phenotype data used in this study were 
obtained from previous studies [12,24]. The mapping plants were 
created by first crossing between indka rice Zhensha 97 and 
Minghui 63 [7] to produce the elite rice hybrid Shanyou 63 that 
was the most widely cultivated in China in 1980s -1990s [24]. 
Then a population of 240 Fg RILs was derived from single-seed 
descent of Shanyou 63. Next, an "immortalized Fg" (IMFj) 
population consisted of 278 crosses was created by intercrossing 
RILs for QTL mapping study [7,23]. The crossed population was 
field tested on the experimental farm of Huazhong Agricultural 
University in Wuhan, China, in 1999, for traits including yield per 
plant, the number of panicles per plant, the number of grains per 
panicle and grain weight. 

The RILs were genomic sequenced with an lUumina Genome 
Analyzer II using the bar-coded multiplexed sequencing approach 
as described in [25], and 270,820 high quality SNPs were 
identified. Bin maps were constructed by lumping consecutive 
SNPs with the same genotype into blocks, masking blocks with less 
than 250 kb to avoid false double recombinations, and merging 
recombination bins less than 5 kb, resulting in a map consisting of 
1,619 bins without missing data [12]. Genotypes of the IMFj 
crosses were deduced according to genotypes of their RIL parents 
[24] . The three genotypes in each bin were coded as A and B for 
each parental homozygote genotype and H for the heterozygote. 
Using the recombinant bins as QTLs, a 1,625.5 cM genetic 
linkage map was constructed with about 1.0 cM (230 kb) in length 
per bin (Figure 1). 

Bayesian Lasso linear regression model for multiple QTLs 

We employed a Bayesian Lasso (BLasso) multiple linear 
regression model to infer genotypes and quantitative trait 
associations. The regression model includes main additive and 
dominance effects of 1,619 SNP bins and all their pair-wise 
interactions. Letj, be the phenotypic value of a quantitative trait of 
the rth individual in a mapping population. In this study we 
observed jv,-, i- 1, n, of n = 278 individuals and collected them 
into a vector y=\yi, y^, yX^- In these ii individuals, let 

Table 7. Estimated QTL effects from the full model for yield 
per plant. 



Loci(;; jY 




p-value"^ 




(1014_add,1014 add) 


- 1 .94(0.37) 


1.95x10"' 


0.0544 


(113 add, 1547 add) 


-2.81(0.53) 


1.43x10"' 


0.0552 


(1057 add,l 144 dom) 


-2.89(0.53) 


5.57x10"" 


0.0608 


(743_dom.1043_d„m) 


3.21(0.52) 


8.38x10"'° 


0.0598 


Parameter(s) 




a=l, b=1 




A' 




-0.7521 




"l 




22.9734 








0.3401 





"add: additive effect; dom: dominance effect. If / equals then it is a main effect, 
otherwise, it is an interaction between locus / and locus j. Total number of 
effects is 4, all with a p-value <0.01. 

*1"he estimated marker effect is denoted by ^ and the standard deviation is 
denoted by .v^. 

*^P-value is obtained via f-test. 
''phenotypic variation explained. 
doi:1 0.1 371 /journal.pone.0087330.t007 
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Figure 4. Interaction network of seven QTLs for yield per plant. The circle shows the bin map and columns indicate position of the makers 
(ticks in million base pairs). The thickness of a link is proportional to the strength of the interaction effect. A short straight line indicates a main effect. 
Molecularly characterized genes related to yield are also labeled in the appropriate positions of the genome. 
doi:l 0.1 371/journal.pone.0087330.g004 



m = 1,619 denote the number of genetic markers genotyped whose 
main efTects include additive and dominance effects. Let the 
additive and dominance genotypes of marker j of individual i be 
XAij and Xoij, respectively, where x^^, takes on values +1,0 and — 1, 
and xdij takes on values 0, +1 and 0, corresponding to genotypes A, 
H and B, respectively. Let us definex^, = [x^,i>^^i2."%^^i>ji]^ and 
'^Gi = [xDi\,XDa,'",XDini\'^ ■ The interactions between any two 
effects are modeled as element-wise product of the corresponding 
main effects. Let x^„ x^, x£)^„ and x^^k be m{m—\)/2x\ 
vectors containingx^,y.-c^,/, XAij-XDif,XDfXAif, and XDij-Xoif, 
respectively, where y=l,--,OT— 1 and j'>j. Then we have the 
following linear regression model forji: 

y = H + XAPA+^DPD+^AAPAA+^ADPAD 

+ ^daPda + ^ddPdd + 

where fl is the population mean, vectors Pa and Pr, represent the 
main additive and dominance effects of all markers, respectively, 
and vectors Paa, Pad, Pda and Pod capture the additive x additive, 
additive x dominance, dominance x additive, and dominance x 
dominance interactions, respectively. Matrices = [x^i,x_^2i ' ' ' > 

X^„] X/) = [Xj3l ,XJ32, • • • ,X/)„] ^ , XaA = [x^^l ,X^^2, ' ' ' ,X^^„] ^, 
X^Z) = ,^AD2, • • • ,^ADn] > ^DA = [Xfl^l ,^DA2, ' ' ' ,^DAn] ) 

and X^/) = [xj)j)i,Xj)B2> • • • >Xz)z)K]^are the corresponding design 
matrices of difiFerent effects, and Ig is the residual error that follows 
a normal distribution with zero mean and variance (JqI. 



Given m markers, the size of matrix or X^) is nxm, and the 
size of 'X.aa, ^ad, ^da, or ILod is nxq, where q = m(m—\)/ 
2- 1,309,771. Defining P=\fiUlPlA'fAD'fiL'PW ^ and 
X=\Xa,Xj),Xaa,^ad,^da,^dd], we can write (2) in a more 
compact form: 

y = fi+Xp+e. (3) 

The size of matrix X is nxk, where A: = 2m+4(/ = 5,242,322, and 
we apparently have k»n. However, we would expect that most 
elements of P ait zeros and thus we have a sparse linear model. 
The Blasso model employs a three-level hierarchical prior 
distribution to model the sparsity. At the first level, 
\etPj,j=l,2,---,k, follows an independent normal distribution 

with mean zero and unknown variance aj : pjN(fi,aj). At the 
second level, let aj, j= 1, 2, k, follows an independent 
exponential distribution with a common parameter h 
p{(7j) = 2.exp{ — Aaj). At the third level, we assign a conjugate 
Gamma prior Gammdfi, b) with a shape parameter a and an inverse 
scale parameter b to the parameter X. Finally, we assign non- 
informative uniform priors to n and (7q. The three-level 
hierarchical model has two hyperparameters {a, b) for adjusting 
the degree of shrinkage, and cross validation (CV) can be apphed 
to choose appropriate values of these parameters. 
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The QTL model (2) or equivalently (3) includes all main effects 
and digenic interactions. We refer to this model as the full model 
throughout the paper. We also performed QTL mapping with the 
model y = ^i + XaP/i+^dPd which is referred to as the main 
effect model, since it includes only the main effects. 

Model inference and cross validation 

The Blasso model can be inferred efficiently with the empirical 
Blasso (EBlasso) algorithm [2 1] . The EBLasso algorithm employs a 
coordinate ascent method to find aj, the estimate of ,j = 0 k, 
that maximizes the likelihood function of aj,j=0, k. In the 
iterative process, many cpj or equivalent {ij are shrunk to zero. The 
coordinate ascent method along with other algorithmic techniques 
makes the EBlasso algorithm very efficient. Our previous studies 
demonstrated that EBlasso outperformed several other multiple 
QTL mapping methods including the empirical Bayes method 
[26], the Bayesian hierarchical generalized linear models 
(BhGLM) [27], HyperLasso [28], and Lasso [29]. Detailed 
description of the EBlasso algorithm can be found in [21,22] 
and an efficient C program with the R interface [30] implement- 
ing the EBlasso algorithm is available. 

The optimal values of two hyperparameters (a, A) of the EBLasso 
algorithm were obtained with five-fold CV in three steps to 
minimize the prediction error [PE] calculated from 

n 

PE= ^ ^ (y,: —yi) , where yi,i= 1, ■ ■ • is the estimated pheno- 

i=l 

typic value. In the first step, fl=A = 0.001, 0.01, 0.1, 1 were 
examined and a pair (flj, b\) corresponding to the smallest /"£ was 
obtained. In the second step, b was fixed at bi and a was chosen 
from the set [-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, 
-0.1, -0.01, 0.01, 0.05, 0.1, 0.5, 1], which yielded a value 
corresponding to the smallest PE. In the third step, a = (Zj was fixed 
and b varied from 0.01 to 10 with a step size of one for b>\ and a 
step size of one on the logarithmic scale for b<\. Note that when 
fixing one of the two parameters, the degree of shrinkage is a 
monotonic function of the other parameter [21,22]. Therefore, in 
the second and third steps, the selection did not go through the fuU 
path but stopped if the current PE was one standard error larger 
than the minimum PE in previous steps. 

Statistical significance test 

One advantage of the EBLasso algorithm relative to Lasso [29] 
is that it not only outputs sl k' x 1 (k' «k) vector ^as an estimate of 
nonzero elements of /i, but also gives an estimate of the covariance 
of/?, S. Letting be thejth diagonal element of E, we can use the 

- 1/2 

i-statistics Pj/^ij to test if pj^O at a. certain significance level. 
Supporting Information 

File SI Tables S1-S12. Table SI. Cross-validation for 
determining hyperparameters [a, b) used in QTL mapping for 
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Table 8. Estimated QTL effects from the main effect model 
for yield per plant. 



locus' 


%)* 


p-value' 


'1^" 


181 add 


— 1 .87(0.42} 


6.24x10"^ 


0.0518 


1014 add 


-2.27(0.43) 


9.58x10"" 


0.0743 


1057 add 


-2.41(0.43) 


2.58x10"" 


0.0848 


1100 add 


1.17(0.39) 


1.35x10"' 


0.0205 


Parameter{s) 




a=-0.5, b = 0.1 








-0.1500 




"l 




26.2580 








0.2379 





"add: additive effect; dom: dominance effect. Total number of effects is 4, all 
with a p-value <0.01 . 

'^he estimated marker effect is denoted by ji and the standard deviation is 
denoted by .v^. 

*^P-value Is obtained via f-test. 
''Phenotypic variation explained. 
doi:l 0.1 371 /journal.pone.0087330.t008 
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